Summary of Pearl 2014

Lord’s paradox

Background

Two statisticians are interested in determining whether there are differences among males and females in weight gain over the course of semester. In the original depiction by Lord, there is an implication of some diet treatment, but all that can be assumed is that those under consideration both received the ‘treatment’ if there was one. The two statisticians take different approaches to examining the data, yet come to different conclusions.

Variables

  • sex
  • weight time 1
  • weight time 2
  • weight change

Issues

The following graph is from Pearl 2014. The ellipses represent the scatterplots for boys and girls. The diagonal 45o degree line would represent no change from time 1 to time 2. The center of the ellipses are both on this line, and thus the mean change for boys and girls are identical and zero. The density plot in the lower left depicts the distribution of change scores centered on this zero estimate.

Results

Statistician 1 focuses on change scores, while statistician 2 uses an ancova to examine sex differences at time 2 while adjusting for initial weight.

  • t-test for group difference on change score vs. ancova approach
  • Statistician 1 concludes no difference in change
  • Statistician 2 concludes a difference in time 2 if controlling for time 1

DAG

The model can be depicted as a directed acyclic graph as follows.

In the above we can define the following effects of sex on weight:

  • Indirect effect: is a*c
  • Direct effect: b
  • Total effect: b + a*c - a

Gain is completely determined as the difference between final weight and initial weight.

Who is correct?

  • Both statisticians are correct
  • t-test on change = total effect
  • ancova = direct effect

In summary, the two statisticians are focused on different effects from the same model.

Data Example

We can get an explicit sense of the results by means of a hands on example. In the following we have simulated data that will reproduce the situation described thus far. Parameters were chosen for visual effect. One thing that’s not been noted about the example is that it likely would not occur. The total effect by definition would be larger than the direct effect as there would be strong sex differences at initial weight, and a strong correlation between initial and final weight, all in a positive manner. The other issue unadressed is that the entire focus is on whether an effect exists hinges on p-values, and with large enough data and such simple models, anything would flag significant. Plus there are issues of variance, nonlinear relations etc.

library(dplyr)
set.seed(1234)
N = 200
group  = rep(c(0, 1), e=N/2)
pre = .75*group + rnorm(N, sd=.25)
post = .4*pre + .5*group + rnorm(N, sd=.1)
change = post-pre
df = data.frame(id=factor(1:N), group=factor(group, labels=c('Female', 'Male')), pre, post, change)
dflong = tidyr::gather(df, key=time, value=score, pre:post) %>% arrange(id)
head(df)
  id  group         pre        post        change
1  1 Female -0.30176644 -0.07218389  0.2295825445
2  2 Female  0.06935731  0.09741980  0.0280624915
3  3 Female  0.27111029  0.12699551 -0.1441147849
4  4 Female -0.58642443 -0.16449642  0.4219280069
5  5 Female  0.10728117  0.07408057 -0.0332006005
6  6 Female  0.12651397  0.12665183  0.0001378524
head(dflong)
  id  group      change time       score
1  1 Female  0.22958254  pre -0.30176644
2  1 Female  0.22958254 post -0.07218389
3  2 Female  0.02806249  pre  0.06935731
4  2 Female  0.02806249 post  0.09741980
5  3 Female -0.14411478  pre  0.27111029
6  3 Female -0.14411478 post  0.12699551

In the following we’ll use lavaan to estimate the full mediation model, then run separate regressions to demonstrate the t-test on change vs. the ancova approach. As noted above, the t-test on change score measures the total effect of sex on final weight, while the ancova measures the direct effect. It is unnecessary to distinguish them as separate modeling approaches, as they are merely regressions with different target variables.

mod = "
  pre ~ a*group
  post ~ b*group + c*pre
  # change ~ -1*pre + 1*post
 
  # total effect
  TE := (b + a*c) - a
"

library(lavaan)
lpmod = sem(mod, data=df)

summary(lpmod)
lavaan (0.5-20) converged normally after  28 iterations

  Number of observations                           200

  Estimator                                         ML
  Minimum Function Test Statistic                0.000
  Degrees of freedom                                 0
  Minimum Function Value               0.0000000000000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Regressions:
                   Estimate  Std.Err  Z-value  P(>|z|)
  pre ~                                               
    group      (a)    0.800    0.036   22.317    0.000
  post ~                                              
    group      (b)    0.447    0.026   17.028    0.000
    pre        (c)    0.445    0.028   16.043    0.000

Variances:
                   Estimate  Std.Err  Z-value  P(>|z|)
    pre               0.064    0.006   10.000    0.000
    post              0.010    0.001   10.000    0.000

Defined Parameters:
                   Estimate  Std.Err  Z-value  P(>|z|)
    TE                0.004    0.024    0.165    0.869
summary(lm(change~group, df))  # t-test on change scores = total effect

Call:
lm(formula = change ~ group, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5174 -0.1133  0.0208  0.1310  0.4612 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.038975   0.017300   2.253   0.0254 *
groupMale   0.004028   0.024466   0.165   0.8694  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.173 on 198 degrees of freedom
Multiple R-squared:  0.0001369, Adjusted R-squared:  -0.004913 
F-statistic: 0.02711 on 1 and 198 DF,  p-value: 0.8694
summary(lm(post~group+pre, df)) # 'ancova' uncovers direct effect etc.

Call:
lm(formula = post ~ group + pre, data = df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.315819 -0.066341  0.005835  0.060977  0.268154 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.01724    0.01008    1.71   0.0888 .  
groupMale    0.44744    0.02648   16.90   <2e-16 ***
pre          0.44538    0.02797   15.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1002 on 197 degrees of freedom
Multiple R-squared:  0.9463,    Adjusted R-squared:  0.9457 
F-statistic:  1734 on 2 and 197 DF,  p-value: < 2.2e-16

Aside

Use change score adjusting for initial weight. Note that this duplicates the ancova result for the group effect (i.e. it is the direct effect). But note that the coefficient for initial weight is equivalent to the ancova coefficient -1. See for example, Senn 2006.


Call:
lm(formula = change ~ group + pre, data = df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.315819 -0.066341  0.005835  0.060977  0.268154 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.01724    0.01008    1.71   0.0888 .  
groupMale    0.44744    0.02648   16.90   <2e-16 ***
pre         -0.55462    0.02797  -19.83   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1002 on 197 degrees of freedom
Multiple R-squared:  0.6662,    Adjusted R-squared:  0.6628 
F-statistic: 196.6 on 2 and 197 DF,  p-value: < 2.2e-16

Treatment with confounding

Background

Wainer & Brown 2007 took a different interpretation of the paradox

Variables

  • weight time 1
  • weight time 2
  • weight change
  • table A vs. B

Issues

*The choice of comic sans font in the graph due to Wainer and Brown and should be held against them.

  • Heavier kids more likely to sit at table B
  • Two statisticians come the conclusions as before

Results

  • Weight time 1 is now a confounder
  • Arrow from time 1 to ‘treatment’
  • Statistician 1 is incorrect because they do not adjust
  • Statistician 2

Birth Weight Paradox

Backgroundo

Variables

  • birthweight
  • smoking mom
  • infant mortality

Issues

  • No difference score
  • Before, focus on clash between two seemingly legitimate methods of analysis
  • Now using ancova but results seem implausible

Results

  • low birthweight children have higher mortality rate (100 fold higher)
  • children of smoking mothers notably more likely to have low birghtweight
  • low birthweight children born to smoking mothers have a lower mortality rate
  • Conclusion: expectant mothers should start smoking!

Collider bias (explain away effect)

Explanation

Perspective 1

  • Controlling just for smoking leaves other causes, resulting in bias
  • Controlling for smoking changes the probability of other causes (due to BW collider) for any stratum of BW
  • Example: for BW=’low, if we compare smoking vs. non-smoking we are also comparing rare other causes vs. likely other causes

Perspective 2

  • Mediation context of previous: we want to know the mortality rate of babies e.g. smokers vs. non if BW controlled for
  • However here we have confounders, whereas before, the fundamental assumption was that there weren’t any.
  • Ajusting for BW doesn’t sever all paths traversing the mediator, and actually opens up a new path, and the effect is now spurious
  • For low BW, comparison of smoking vs. non-smoking compares no other causes vs. other causes

Simpson’s paradox more generally

Pearl 2013